In silico analysis of suitable signal peptides for secretion of a recombinant alcohol dehydrogenase with a key role in atorvastatin enzymatic synthesis

An elevated cholesterol level might lead to cardiovascular disease (CVD). Statins block the cholesterol synthesis pathway in the liver. Atorvastatin is the most widely used statin worldwide, and its chemical synthesis requires toxic catalysts, resulting in environmental pollution. Hence, enzymatic synthesis of atorvastatin is desirable. This process can be carried out by Lactobacillus kefir alcohol dehydrogenase (LKADH). Secretion of the recombinant enzyme by Escherichia coli using signal peptides (SPs) might therefore allow easier production and purification. To this end, we used several online bioinformatics web servers to identify suitable SPs for translocation of LKADH into the extracellular space. “Signal Peptide Website” and “UniProt” were used to retrieve the SP and LKADH sequences. “SignalP 4.1” was used to identify SPs and the location of their cleavage sites, and the results were rechecked with “Philius”. Physicochemical features of the SPs were evaluated with “ProtParam”, and the solubility of their fusions with LKADH was then assessed with “Protein-sol”. Finally, the secretion pathway and sub-cellular localization of the selected stable and soluble LKADH fusions were predicted with “PRED-TAT” and “ProtCompB”. Amongst the 41 evaluated SPs, only LPTA_ECOLI, SUBF_BACSU, CHIS_BACSU, SACB_BACAM, CDGT_BACST and AMY_BACLI were predicted to translocate LKADH out of the cytoplasm. These six SPs are suitable for designing a soluble secretory LKADH, which could accelerate its scale-up production, and might be useful in future experimental research.


INTRODUCTION
Cardiovascular disease (CVD) is one of the major causes of morbidity and mortality worldwide [1]. An increased level of low-density lipoprotein cholesterol (LDL-C) is considered the main risk factor for CVD. 3-Hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase inhibitors ('statins') were introduced in the late 1980s to prevent cardiovascular disease. Statins are potent inhibitors of HMG-CoA reductase; they block the cholesterol synthesis pathway in the liver and reduce major cardiovascular events [2]. Statins have also been reported to interfere with other biological pathways and to have several potential therapeutic effects [3]. Atorvastatin is a statin that reduces LDL cholesterol, which lowers the mortality rate due to coronary heart disease.
Lipitor® (atorvastatin calcium) is one of the best-selling drugs in the world. The side chain of atorvastatin has two chiral centers, and their synthesis is a critical step in the atorvastatin synthesis process [4,5]. The chemical synthesis of atorvastatin requires expensive catalysts and causes severe environmental pollution. Suitable enzymes can therefore reduce costs and prevent environmental toxicity. Lactobacillus kefir alcohol dehydrogenase (LKADH) was introduced in 1990 by Hummel et al. LKADH has been shown to be a beneficial enzyme for the industrial production of atorvastatin, since it can synthesize the side chain of atorvastatin by reducing tert-butyl 6-chloro-3,5-dioxohexanoate to tert-butyl (S)-6-chloro-5-hydroxy-3-oxohexanoate, using nicotinamide adenine dinucleotide phosphate (NADPH) in a fed-batch system. NADPH is an expensive reagent and is the limiting factor in the production of the atorvastatin side chain by LKADH; hence, its regeneration is of significant importance, especially at industrial scale. LKADH, as a single-enzyme system, can effectively regenerate NADPH using a cost-efficient solvent such as ethanol while synthesizing the product. This dual function is considered a significant advantage [6,7].
Recombinant DNA technology can support the industrial production of proteins by reducing the cost and increasing the efficiency of bioprocesses. Escherichia coli (E. coli) is one of the most widely used expression hosts for heterologous recombinant proteins [8]. High-level expression of recombinant proteins in E. coli can result in extensive aggregation of insoluble, misfolded proteins in the cytoplasm, known as inclusion bodies. These aggregated intermediates lack proper biological activity. Hence, inclusion bodies have to be refolded to obtain correctly folded, soluble proteins [9]. One solution is to produce secretory recombinant proteins, which are exported into the periplasmic space or the culture medium. Protection from protease attack, easier purification, correct formation of disulfide bonds and accurate protein folding are the advantages of secretory production of recombinant proteins over cytoplasmic expression [10].
For secretory production, recombinant proteins can be directed to the periplasmic space or the culture medium by fusing suitable signal sequences to their N-terminus. E. coli has several common translocation systems, including the Sec system, the signal recognition particle-dependent (SRP-dependent) pathway, and the twin-arginine translocation (TAT) system. The efficiency of a translocation system can be increased by using alternative signal peptides (SPs), which may be obtained from heterologous species. The use of SPs can significantly increase protein production at a commercial level. An SP contains various motifs that are necessary to target a specific protein to the extra-cytoplasmic space. It is located at the N-terminus of the immature protein and is removed by signal (leader) peptidase. SPs are usually 15-30 amino acids long and comprise three distinct regions. Generally, the n-region comprises 5-8 residues and is positively charged, the h-region is composed of 8-12 hydrophobic residues, and the c-region contains 5-7 polar residues and includes the cleavage site at its carboxyl terminus [11].
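The positive charge of the n-region described above can be estimated directly from a sequence. The following is a minimal Python sketch, not the procedure used in this study: it assumes that the n-region length is already known (for example, from a SignalP region assignment) and that net charge is simply the number of Lys/Arg residues minus the number of Asp/Glu residues. The function name and the toy sequence are hypothetical and are not taken from Table 1.

```python
# Minimal sketch: net charge of the n-region of a signal peptide.
# Assumption: net charge = (Lys + Arg) - (Asp + Glu); the N-terminal
# amino group and histidine are ignored for simplicity.

POSITIVE = set("KR")
NEGATIVE = set("DE")


def n_region_net_charge(signal_peptide: str, n_region_length: int) -> int:
    """Return the net charge of the first n_region_length residues."""
    n_region = signal_peptide[:n_region_length].upper()
    positives = sum(residue in POSITIVE for residue in n_region)
    negatives = sum(residue in NEGATIVE for residue in n_region)
    return positives - negatives


# Hypothetical toy signal peptide, invented for illustration only.
toy_sp = "MKRSLLLAVLALLASGAQA"
print(n_region_net_charge(toy_sp, 4))
```

The n-region length is passed in explicitly because the boundary between the n- and h-regions is a prediction in its own right and differs from one SP to another.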
Various computational approaches have been applied to predict suitable N-terminal SPs, and different bioinformatics tools have been used to predict the presence and location of SPs [12]. In the present study, several online web servers were used to investigate suitable SPs for secretory production of LKADH. To the best of our knowledge, no in silico study on the secretory production of LKADH has been reported.

MATERIALS AND METHODS
Dataset retrieval: "Signal Peptide Website" (http://www.signalpeptide.de/) was used to retrieve 41 appropriate SPs. SPs were chosen according to two criteria: they were marked as confirmed in this database, and they belonged to bacterial secretory proteins. The collected data were validated against the experimental evidence recorded in the "UniProt" server (http://www.uniprot.org/). The amino acid sequence of LKADH was also retrieved from UniProt (Table 1). In Table 1, the amino acids of the n-region are shown in bold and those of the c-region are underlined.
Prediction of signal peptides presence and their cleavage site location: "SignalP 4.1" (http://www.cbs.dtu.dk/services/SignalP/) is an online web server that distinguishes the three regions of SPs and estimates the probability of their presence in a target protein using artificial neural networks (ANNs). SignalP was upgraded to version 4.1 in 2012, and its final decision about the presence of an SP at the N-terminus of an input amino acid sequence is made by applying a cut-off value to the D-score. In this study, a sequence with a D-score higher than 0.57 was considered an SP. SignalP reports three neural network-based scores for each amino acid sequence. SP cleavage sites are determined using the C-score (raw cleavage site score). The S-score (signal peptide score) distinguishes the SP sequence from the target protein sequence and from proteins without SPs. The Y-score (combined cleavage site score) is the geometric average of the C-score and the slope of the S-score, and it predicts the cleavage site better than the raw C-score alone [13]. SignalP results were rechecked with "Philius" (http://www.yeastrc.org/philius/). A sequence with a type confidence of more than 0.5 was considered a signal peptide.
Evaluation of signal peptides physico-chemical properties and solubility: "ProtParam", an online server available at http://web.expasy.org/, was used to predict the physico-chemical properties of the SPs, including amino acid composition, molecular weight, theoretical pI (isoelectric point), positively charged residues, instability index, aliphatic index, and grand average of hydropathicity (GRAVY). ProtParam computes these features from the protein sequence [14]. The instability index was evaluated for each SP alone and for each SP fused to LKADH. The solubility of the SP-LKADH fusions was predicted using "Protein-sol" (http://protein-sol.manchester.ac.uk), a free online web server that reports a predicted scaled solubility in the 0-1 range so that results can be interpreted easily [15]. Unstable and insoluble fusion proteins were removed before the next step.
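Two of the sequence-derived indices listed above can be reproduced with a few lines of code. The following minimal Python sketch is not the ProtParam implementation; it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula with coefficients 2.9 (valine) and 3.9 (isoleucine and leucine) applied to mole percentages. The function names and the toy sequence are hypothetical and are not taken from Table 1.

```python
# Minimal sketch of two ProtParam-style indices (assumed formulas, see text).

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def aliphatic_index(sequence: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * sequence.count(residue) / len(sequence)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical toy signal peptide, invented for illustration only.
toy_sp = "MKRSLLLAVLALLASGAQA"
print(round(gravy(toy_sp), 3), round(aliphatic_index(toy_sp), 2))
```

Because the aliphatic index weights valine, isoleucine and leucine more heavily than alanine, values above 100 are expected for short, leucine-rich sequences such as SPs.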
In silico prediction of signal peptides secretion pathway and sub-cellular localization: The "PRED-TAT" online server (http://www.compgen.org/tools/PRED-TAT) was used to predict the secretion pathway of the SP-LKADH fusions. PRED-TAT differentiates Sec-targeting from Tat-targeting SPs and predicts their cleavage sites, providing a reliability score in the 0-1 range. Its prediction method is based on Hidden Markov Models (HMMs) with an architecture that covers both Sec and Tat SPs [16]. The sub-cellular localization of the SP-LKADH fusions was evaluated using the "ProtCompB" online server (http://www.softberry.com). Its localization prediction is based on neural networks together with the latest localization database of homologous proteins. The average accuracy of "ProtCompB" is between 86% and 100% [17,18].

Evaluation of signal peptides three regions and probability:
The n-, h- and c-regions of the selected SPs were 2-10, 10-20, and 3-9 amino acids long, respectively. The most critical parameter for identifying the presence of an SP was the D-score. An SP with a D-score higher than 0.57 was considered an appropriate SP for the target protein. The in silico analysis indicated that the highest D-score belonged to CWBA_BACSU (0.916). QOX2_BACSU (0.217), CHIS_BACSU (0.313), LPP_ECOLI (0.342), BLAC_BACSU (0.377), CDGT_BACST (0.388), BLAT_ECOLX (0.443), GUB_BACAM (0.463), TOLB_ECOLI (0.471), THER_BACST (0.510) and OMPT_ECOLI (0.524) were not suitable SPs for the excretion of LKADH, since they had D-scores below 0.57 (Table 2). These signal peptides were removed before the next step.
The SignalP 4.1 output includes several different scores. The C-score and S-score were used to determine cleavage sites and signal peptide positions, respectively. The Y-score is the geometric average of the C-score and a smoothed derivative of the S-score. S-mean is the arithmetic average of the S-score from the beginning of the sequence to the position where the Y-score is maximal. The D-score is the average of S-mean and Y-max, and it discriminates secretory from non-secretory proteins with a cut-off value of 0.5. Sequences with a D-score > 0.5 are considered signal peptides.
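The D-score filtering step can be expressed as a simple threshold test. The following minimal Python sketch uses only the D-scores quoted in the text above (the remaining values are in Table 2 and are not reproduced here) together with the 0.57 cut-off used in this study; the variable and function names are hypothetical.

```python
# Minimal sketch of the D-score filter, using only values quoted in the text.
D_SCORE_CUTOFF = 0.57  # cut-off applied in this study

d_scores = {
    "CWBA_BACSU": 0.916,
    "QOX2_BACSU": 0.217,
    "CHIS_BACSU": 0.313,
    "LPP_ECOLI": 0.342,
    "BLAC_BACSU": 0.377,
    "CDGT_BACST": 0.388,
    "BLAT_ECOLX": 0.443,
    "GUB_BACAM": 0.463,
    "TOLB_ECOLI": 0.471,
    "THER_BACST": 0.510,
    "OMPT_ECOLI": 0.524,
}


def split_by_d_score(scores, cutoff=D_SCORE_CUTOFF):
    """Return (retained, rejected) lists of SP names, each sorted by name."""
    retained = sorted(name for name, d in scores.items() if d > cutoff)
    rejected = sorted(name for name, d in scores.items() if d <= cutoff)
    return retained, rejected


retained, rejected = split_by_d_score(d_scores)
print("retained:", retained)
print("rejected:", rejected)
```

Keeping the cut-off as a parameter makes it easy to see how the retained set changes under a different threshold.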

Physico-chemical properties and solubility of signal peptides:
Various physico-chemical features of the SPs are shown in Table 3. The selected SPs were 18-30 amino acids long. The net positive charge of the n-region was 0 for OMPP_ECOLI, 1 for TORT_ECOLI, BLAT_ECOLX, ASPG2_ECOLI, TAUA_ECOLI and PPA_ECOLI, 3 for MALE_ECOLI, LPTA_ECOLI, SACB_BACAM and BLAC_BACSU, 4 for BLAC_BACLI and AMY_BACLI, 5 for SUBF_BACSU and 2 for the other 20 SPs. The grand average of hydropathy (GRAVY) is defined as the sum of the hydropathy values of all amino acids divided by the number of residues in the sequence, and it is used to compare overall hydropathy [14]. SUBF_BACSU (0.497) and TORT_ECOLI (2.061) had the lowest and highest GRAVY values, respectively. Hydrophobicity is also reflected by the aliphatic index, which depends on the content of aliphatic amino acids (i.e., alanine, valine, isoleucine, and leucine) in a protein sequence [14]. QOX2_BACSU (198.46) had the highest aliphatic index, and BLAC_BACSU (92.22) had the lowest. The stability of the signal peptides, alone and in fusion with LKADH, was predicted using the instability index. An instability index below 40 indicates a stable protein, whereas proteins with an instability index above 40 were considered unstable. Based on our results, BLAC_BACSU (15.03) fused to LKADH was the most stable fusion protein. Abbreviations used in Table 3: MW, molecular weight; pI, isoelectric point; GRAVY, grand average of hydropathicity.

Prediction of secretion pathway and sub-cellular localization:
The results of the PRED-TAT web server revealed that all the remaining stable and soluble SPs belonged to the Sec pathway, except QOX2_BACSU, which was predicted to target the protein to a transmembrane segment. These SPs can translocate fused LKADH to different compartments. The sub-cellular localization evaluation by the ProtCompB server indicated that, amongst the SPs in this step, LPTA_ECOLI and SACB_BACAM can localize LKADH in the periplasmic space, SUBF_BACSU, CHIS_BACSU, CDGT_BACST and AMY_BACLI can translocate this heterologous protein into the extracellular space, and the other SPs direct it to the cytoplasm (Table 4).

DISCUSSION
Overexpression of recombinant proteins in the intracellular space of E. coli is usually accompanied by extensive inclusion body aggregation; hence, it is essential to develop a method for periplasmic or extracellular secretion of proteins [9]. Sec, SRP and TAT are some of the protein secretion pathways used by prokaryotes. The role of these pathways is to direct proteins into the periplasmic space according to their SPs. Thus, choosing a proper SP is a critical step in designing secretory recombinant proteins [11]. A new era in medicine and biology has begun with the advent of disciplines such as computational biology and bioinformatics. Using bioinformatics programs before launching an experimental study reduces costs and increases the accuracy and validity of the experimental research [19].
Many online bioinformatics tools can be used to find suitable SPs. The parameter that identifies a peptide as an SP is the D-score of the SignalP server; hence, the D-score was used to sort all SPs in the first step. According to the D-scores (Table 2), 31 of the 41 selected SPs were identified as SPs for LKADH, but more features were needed to evaluate the suitability of an SP.
Some important physico-chemical characteristics of SPs, including the instability index, GRAVY, net positive charge and h-region length, have to be considered for effective protein secretion. A crucial region of an SP is the n-region, which confers the ability for translocation on the desired secretory protein. The presence of one or more basic residues makes the n-region positively charged. The positive charges facilitate the interaction between SPs and phospholipids, which helps the protein translocate across the membrane [20]. Therefore, any substitution that replaces basic residues with neutral or acidic ones in the signal peptide sequence can reduce the rate of protein synthesis and secretion [21]. The results of the present study showed a range of 0-5 for the positive charges of all SPs. As all the selected SPs are in a suitable range of positively charged residues for the n-region, other characteristics also have to be considered when selecting a suitable SP. The h-region is another key region of an SP, and its hydrophobicity has an essential role in membrane targeting and extracellular secretion of proteins [22]. Hydrophobicity is therefore the most crucial factor for the activity of this region, and the length of the h-region is also a determinant of hydrophobicity. Consequently, increasing the length of the h-region can raise the level of hydrophobicity, which helps to promote the protein secretion rate. The aliphatic index and GRAVY are two major parameters that reflect hydrophobicity, and an increase in either parameter indicates elevated hydrophobicity [18,22]. As shown in Table 3, all 31 SPs evaluated in this step of the study had high aliphatic indexes, and their hydrophobicity is suitable for secretion. During protein transport into the extracellular space, signal peptidases cleave the signal peptide sequence at the cleavage site and produce the mature protein product.
The cleavage site, located in the c-region, is often less hydrophobic and includes a signal sequence that is recognized by the signal peptidase. According to the -1, -3 rule, a residue with a small neutral side chain, such as alanine, serine or glycine, should be located at positions -1 and -3. The resulting Ala-X-Ala (AXA) box can be identified and cleaved by signal peptidase [23]. As shown in Table 1, alanine is the most common residue at positions -1 and -3, and the remaining cleavage sites are approximately similar to the AXA box.
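The -1, -3 rule can be checked programmatically. The following minimal Python sketch assumes that the input string is the SP alone, so that its last residue is position -1 relative to the cleavage site, and it treats alanine, serine and glycine as the small neutral residues named above. The function names and the toy sequences are hypothetical and are not taken from Table 1.

```python
# Minimal sketch of the -1, -3 rule for signal peptidase cleavage sites.
# Assumption: the sequence ends exactly at the cleavage site, so
# sequence[-1] is position -1 and sequence[-3] is position -3.

SMALL_NEUTRAL = set("AGS")


def follows_minus1_minus3_rule(signal_peptide: str) -> bool:
    """True if positions -1 and -3 both hold a small neutral residue."""
    sp = signal_peptide.upper()
    if len(sp) < 3:
        return False
    return sp[-1] in SMALL_NEUTRAL and sp[-3] in SMALL_NEUTRAL


def has_axa_box(signal_peptide: str) -> bool:
    """True if the last three residues form a strict Ala-X-Ala box."""
    sp = signal_peptide.upper()
    return len(sp) >= 3 and sp[-1] == "A" and sp[-3] == "A"


# Hypothetical toy signal peptides, invented for illustration only.
for toy_sp in ("MKRSLLLAVLALLASGAQA", "MKKLLSGS", "MKKLLAKL"):
    print(toy_sp, follows_minus1_minus3_rule(toy_sp), has_axa_box(toy_sp))
```

Separating the strict AXA test from the broader small-residue test mirrors the distinction made above between sites with alanine at both positions and sites that only approximate the AXA box.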
Sec and Tat are two common translocation pathways in both gram-negative and gram-positive bacteria that target proteins to the extracellular space [24]. In E. coli, about 50% of all proteins are excreted, with more than 90% of these secreted via the Sec pathway. In the Sec pathway, unfolded proteins translocate across the membrane and are targeted to the extracellular space either co-translationally (via the SRP pathway) or post-translationally. In contrast, fully folded proteins leave the cytoplasm by the Tat pathway, a process that uses the Tat translocation complex. Folding of proteins in the cytoplasm can result in their aggregation and in degradation by cellular proteases; thus, the Sec and SRP pathways seem more suitable than Tat for the secretory production of proteins [25-27]. As indicated in Table 4, all SPs in this step were specific for the Sec pathway, with reliability scores of more than 0.9. Therefore, none of them was omitted at this step, and all were carried forward to sub-cellular localization analysis.
In the end, amongst the 30 stable and soluble SP-LKADH fusions directed toward the Sec pathway, 26 SPs were predicted to direct LKADH to the cytoplasm, and only 4 were predicted to translocate LKADH into the periplasmic or extracellular space.
As far as we know, this is the first study to investigate suitable SPs in fusion with LKADH by analyzing their potential effects on the secretion of this protein. Selecting a suitable and accurate SP for a given protein can reduce the cost and time of the production and purification of recombinant proteins. This study evaluated 41 different SPs in order to select the most applicable ones for secreting the recombinant LKADH protein out of the E. coli host. The results indicated that the LPTA_ECOLI, SUBF_BACSU, CHIS_BACSU, SACB_BACAM, CDGT_BACST and AMY_BACLI SPs can theoretically be considered suitable candidates for LKADH secretion. However, further experimental investigations should be carried out to validate these results.